Project Summary

Goals

  • Test the feasibility of analyzing location-based social media data to address questions about the scale and context of people’s sense of place
  • Conduct at least one of these analyses for Santa Barbara
  • Produce a research plan and objectives for a larger project, informed by the results of this proof of concept, to gauge interest in and the feasibility of pursuing additional phases of the project.

Proposed approach

  • Examine data availability and accessibility from social media platforms
  • Decide on two possible geographies of interest, informed by data availability
  • Extract geotagged data for each geography over a set period of time. The period depends on data availability, but ideally it would span a few years.
  • Apply existing methods to reveal patterns of sense of place
  • Analyze data for potential drivers (i.e., variables from the local context that reveal why a place is important) and interactions (e.g., connections between places)
  • Communicate results through a blog post and across social media platforms
  • Produce a proposal for next steps if this proof of concept shows that a larger, more robust project is feasible

Analysis summary

  • Roughly 80,000 geotagged tweets were collected for Santa Barbara from January 1, 2015 to December 31, 2019
  • Tourists and locals were defined using a two-step process
  • Nature-based tweets were identified using text detection

Findings

  • Tourists tweet more about nature than locals
  • Spatial patterns of use

Takeaways

  • Social media data is difficult to access for academic research. This data was only available through a UCSB Library partnership with Crimson Hexagon, and it still took significant time to retrieve and get into a usable format.
  • We are not pursuing a larger project, but have identified multiple avenues for further research. Project findings will be published in a blog post and shared broadly via social media.

Analysis

Data Overview

Twitter data was obtained at no cost through a partnership between the UCSB Library and Crimson Hexagon. Before downloading, the data was queried to meet the following conditions:

  1. Tweet came from the Santa Barbara area
  2. Tweet is an original tweet (not a retweet)
  3. Tweet was posted between January 1, 2015 and December 31, 2019

Accessing Data

Crimson Hexagon allows only 10,000 randomly selected tweets to be exported at a time, manually, in .xls format. On average, around 5,000 tweets per day met these conditions, so we downloaded the data manually in 2-day windows to capture all tweets.
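The 2-day windows can be generated programmatically. This is a minimal sketch. The actual exports were done by hand in the Crimson Hexagon interface, and the helper name `export_windows` is hypothetical. At about 5,000 tweets per day, a 2-day window holds roughly 10,000 tweets, which matches the export cap.

```python
from datetime import date, timedelta

def export_windows(start, end, days_per_window=2):
    """Return (first_day, last_day) pairs covering start..end inclusive.

    Hypothetical helper: the real exports were downloaded manually.
    """
    windows = []
    step = timedelta(days=days_per_window)
    current = start
    while current <= end:
        # Clip the last window so it never runs past the end date.
        last = min(current + step - timedelta(days=1), end)
        windows.append((current, last))
        current += step
    return windows

windows = export_windows(date(2015, 1, 1), date(2019, 12, 31))
print(len(windows))  # 913
print(windows[0])    # (datetime.date(2015, 1, 1), datetime.date(2015, 1, 2))
```

The five-year study period spans 1,826 days, so it needs 913 separate manual exports.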

The Crimson Hexagon data did not contain all the information we needed, including whether each tweet was geotagged. To get this information, we used the Python twarc library to “rehydrate” the data from individual tweet IDs and stored the full tweet information as .json files. We then removed all tweets without a geotag, leaving a total of 79,981 tweets.
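The geotag filter can be sketched as follows. This assumes twarc wrote the hydrated tweets as JSON Lines in Twitter's v1.1 format, for example with the twarc 1.x command `twarc hydrate ids.txt > tweets.jsonl`. It also assumes that "geotagged" means the tweet's `coordinates` field is set. The helper names and sample records are hypothetical.

```python
import json

def is_geotagged(tweet):
    # Assumption: a tweet counts as geotagged when its v1.1 "coordinates" field is not null.
    return tweet.get("coordinates") is not None

def load_geotagged(jsonl_lines):
    """Parse JSON Lines tweets and keep only the geotagged ones."""
    rows = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        if not is_geotagged(tweet):
            continue
        # GeoJSON points are ordered [longitude, latitude].
        lon, lat = tweet["coordinates"]["coordinates"]
        rows.append({
            "id": tweet["id_str"],
            "full_text": tweet.get("full_text", ""),
            "lat": lat,
            "lon": lon,
        })
    return rows

# Hypothetical sample records; with a real file: with open("tweets.jsonl") as f: rows = load_geotagged(f)
sample = [
    '{"id_str": "1", "full_text": "geotagged", '
    '"coordinates": {"type": "Point", "coordinates": [-119.714, 34.4258]}}',
    '{"id_str": "2", "full_text": "not geotagged", "coordinates": null}',
]
rows = load_geotagged(sample)
print(len(rows))  # 1
```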

Table of data

Here is a sample of the final Twitter data we obtained.

Month | Day | Time | Year | full_text | user_location | retweet_count | favorite_count | month_num | date
Apr | 16 | 04:18:48 | 2018 | Congratulations to all of the Spirit of Fiesta participants! PC: Fritz Olenberger We can’t wait… https://t.co/t6cjRHxgmD | Santa Barbara, CA | 0 | 0 | 4 | 2018-04-16
Jun | 4 | 05:55:33 | 2016 | That’s dessert @ Stonehouse at San Ysidro Ranch https://t.co/n2brT1oJ3M | South Florida | 0 | 0 | 6 | 2016-06-04
Nov | 15 | 17:31:00 | 2019 | It’s been 5 years since Brandon and I made the adventurous move to #santabarbara to open our own business together. We couldn’t be more happy and proud of our first bar @thegoodlionbar and living in this beautiful… https://t.co/mbgENdrH3y | Santa Barbara, CA | 1 | 3 | 11 | 2019-11-15
Feb | 7 | 02:34:11 | 2016 | That guy at the top of the key? His name is Caleb and he scored the winning shot. First place… https://t.co/T8wZG7YCRF | Santa Barbara, California | 0 | 0 | 2 | 2016-02-07
Sep | 27 | 01:31:58 | 2018 | These shoes were made for …..sitting on the curb and taking photos of them 🌾 @saboskirt #saboskirt @ Santa Barbara, California https://t.co/s3FW8rj84V | Santa Barbara, CA | 0 | 0 | 9 | 2018-09-27
Jan | 14 | 21:58:32 | 2018 | North County Fire District (Monterey County) parked at a taqueria #mutualaid https://t.co/YtAkjvKR6W | Seattle → Bangkok → SoCal | 0 | 0 | 1 | 2018-01-14
May | 28 | 20:06:02 | 2017 | ⛪️ @ The Historic Old Mission Santa Barbara https://t.co/xtlHZPa06v | NA | 0 | 0 | 5 | 2017-05-28

Because the dataset is limited to geotagged tweets, I hypothesize that most of these tweets contain a picture or a link to an Instagram post. We can detect links by searching the tweet text for “t.co”, Twitter’s URL shortener for links to other webpages. These links often point to Twitter or Instagram photos, but we can’t be certain.

About 93% of geotagged tweets contain a link or picture.
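Here is a sketch of the link check in Python. Instead of a bare "t.co" substring, it searches for `t.co` URLs with a regular expression. A bare substring can also appear inside ordinary text, such as "about.com". The function names and the third sample tweet are hypothetical.

```python
import re

# Twitter wraps links, including attached photos, in t.co short URLs.
TCO_LINK = re.compile(r"https?://t\.co/\w+")

def has_link(text):
    return TCO_LINK.search(text) is not None

def share_with_links(texts):
    """Fraction of tweets containing at least one t.co link."""
    if not texts:
        return 0.0
    return sum(has_link(t) for t in texts) / len(texts)

sample = [
    "That’s dessert @ Stonehouse at San Ysidro Ranch https://t.co/n2brT1oJ3M",
    "⛪️ @ The Historic Old Mission Santa Barbara https://t.co/xtlHZPa06v",
    "Just a plain text tweet",  # hypothetical tweet without a link
]
print(round(share_with_links(sample), 3))  # 0.667
```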

Map of twitter data

The spatial distribution of tweets highlights densely populated areas and tourist areas in downtown Santa Barbara. A single coordinate, 34.4258, -119.714, has over 11,000 tweets across all years. It is near De La Vina between Islay and Valerio. Nothing about this site is remarkable, so I assume it is the default coordinate assigned when people tag “Santa Barbara” generally.
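One way to find default coordinates like this is to count tweets per coordinate pair, as in this sketch. Rounding to four decimal places and the second sample point are assumptions for illustration.

```python
from collections import Counter

def most_common_coordinates(points, n=1, ndigits=4):
    """Count tweets per (lat, lon) pair rounded to ndigits decimals; return the n most common."""
    counts = Counter((round(lat, ndigits), round(lon, ndigits)) for lat, lon in points)
    return counts.most_common(n)

# Hypothetical points: three at the suspected default coordinate, one elsewhere.
points = [(34.4258, -119.714)] * 3 + [(34.4208, -119.6982)]
print(most_common_coordinates(points))  # [((34.4258, -119.714), 3)]
```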


Tweet density

Each hexagon shows the log10 of the number of tweets in that area. The highest count in a single hexagon is around 11,000 (yellow hex). This hexagon contains the default coordinate used when tweets are geotagged to the city of Santa Barbara without a precise location.
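The hex layer was presumably produced with a mapping library. This pure-Python sketch shows the underlying computation: assign each point to a hexagon, count points per hexagon, and take log10. It assumes pointy-top hexagons in axial coordinates and treats longitude and latitude as planar x and y, although a projected coordinate system would be more accurate. `point_to_hex` and `hex_log10_density` are hypothetical names.

```python
import math
from collections import Counter

def point_to_hex(x, y, size):
    """Return axial (q, r) coordinates of the pointy-top hexagon containing (x, y)."""
    # Fractional axial coordinates for hexagons with the given center-to-corner size.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Convert to cube coordinates (which sum to zero) and round to the nearest hexagon.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    # Recompute the coordinate with the largest rounding error so the sum stays zero.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)

def hex_log10_density(points, size):
    """Map each occupied hexagon to log10 of its point count.

    Assumption: points are (x, y) = (longitude, latitude) treated as planar.
    """
    counts = Counter(point_to_hex(x, y, size) for x, y in points)
    return {cell: math.log10(n) for cell, n in counts.items()}

points = [(0.0, 0.0)] * 1000 + [(10.0, 10.0)]
density = hex_log10_density(points, size=1.0)
print(density[(0, 0)])  # 3.0
```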

Timeline of tweets

The number of geotagged tweets declines over time, with a sharp drop at the end of April 2015. This appears to reflect a change in Twitter’s ‘post Tweet’ user-interface design that “results in fewer Tweets being geo-tagged” (source). The first 4 months of 2015 contain 15,720 tweets, roughly 20% of all tweets. To reduce skew in the data and to exclude tweets that users may have geotagged without realizing it in those months, we kept only tweets from May 1, 2015 through December 31, 2019.
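The study-period cut and the monthly timeline can be sketched as below. This assumes each tweet record carries a `date` field holding a `datetime.date`. The helper names and sample records are hypothetical.

```python
from collections import Counter
from datetime import date

# Study window after the April 2015 drop in geotagging.
START = date(2015, 5, 1)
END = date(2019, 12, 31)

def filter_study_period(tweets, start=START, end=END):
    """Keep tweets whose 'date' falls within [start, end]."""
    return [t for t in tweets if start <= t["date"] <= end]

def monthly_counts(tweets):
    """Count tweets per (year, month) to inspect the timeline."""
    return Counter((t["date"].year, t["date"].month) for t in tweets)

# Hypothetical records with only the field this sketch needs.
tweets = [{"date": date(2015, 4, 30)}, {"date": date(2015, 5, 1)}, {"date": date(2018, 1, 14)}]
kept = filter_study_period(tweets)
print(len(kept))  # 2
```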

Tourists and locals

This project aims to understand whether and how preferences for nature-based places in the Santa Barbara area differ between tourists and locals. To test this, we needed a way to identify tourists and locals. We used a two-step process.

First, if a user has self-identified their location as somewhere in the Santa Barbara area (Carpinteria, Santa Barbara, Montecito, Goleta, Gaviota, or UCSB), they are designated a local. For the remaining users, we use how often they tweeted from Santa Barbara within a year. If someone tweeted from Santa Barbara across more than 2 months of the same year, they are identified as a local; otherwise, they are a tourist. This is consistent with how Eric Fischer identified tourists in his work. The method is not foolproof. Some people visit and tweet from Santa Barbara in more than two months a year, especially if they are visiting family or live within a couple of hours’ drive.
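The two-step rule can be sketched as follows, with two assumptions. First, "tweeted across more than 2 months" is read as tweeting in more than two distinct calendar months of the same year. Second, self-reported locations are matched by case-insensitive substring. The field names and users are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Self-reported locations treated as local (from the list above).
LOCAL_PLACES = ("carpinteria", "santa barbara", "montecito", "goleta", "gaviota", "ucsb")

def is_self_identified_local(user_location):
    """Step 1 (assumption: case-insensitive substring match on the profile location)."""
    if not user_location:
        return False
    location = user_location.lower()
    return any(place in location for place in LOCAL_PLACES)

def classify_users(tweets, max_tourist_months=2):
    """Label each user 'local' or 'tourist'.

    tweets: iterable of dicts with 'user', 'user_location', and 'date' (datetime.date).
    """
    months = defaultdict(set)  # (user, year) -> distinct months with a Santa Barbara tweet
    location = {}              # user -> first self-reported location seen
    for t in tweets:
        months[(t["user"], t["date"].year)].add(t["date"].month)
        location.setdefault(t["user"], t.get("user_location"))

    # Step 2 (assumption: count distinct calendar months in the user's busiest year).
    busiest_year = defaultdict(int)
    for (user, _year), month_set in months.items():
        busiest_year[user] = max(busiest_year[user], len(month_set))

    labels = {}
    for user, loc in location.items():
        if is_self_identified_local(loc) or busiest_year[user] > max_tourist_months:
            labels[user] = "local"
        else:
            labels[user] = "tourist"
    return labels

tweets = [
    {"user": "a", "user_location": "Santa Barbara, CA", "date": date(2018, 4, 16)},
    {"user": "b", "user_location": "Los Angeles, CA", "date": date(2016, 1, 5)},
    {"user": "b", "user_location": "Los Angeles, CA", "date": date(2016, 2, 5)},
    {"user": "b", "user_location": "Los Angeles, CA", "date": date(2016, 3, 5)},
    {"user": "c", "user_location": None, "date": date(2017, 5, 28)},
    {"user": "c", "user_location": None, "date": date(2017, 6, 2)},
]
print(classify_users(tweets))  # {'a': 'local', 'b': 'local', 'c': 'tourist'}
```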

There are 21,811 tweets from tourists and 45,420 tweets from locals.

The following map shows areas with more tweets from locals (orange) or from tourists (purple). The values are the log10 of the absolute difference between the numbers of tweets from the two groups. For example, a purple hex with a value of 2 has about 100 more tweets from tourists than from locals.
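Here is a sketch of the value behind each hex, assuming tweet counts per hex cell have already been computed for each group. The cell IDs and counts are hypothetical. Ties are left without a value because log10(0) is undefined.

```python
import math

def difference_layer(tourist_counts, local_counts):
    """Per hex cell: (majority group, log10 of the absolute difference in tweet counts)."""
    layer = {}
    for cell in set(tourist_counts) | set(local_counts):
        n_tourist = tourist_counts.get(cell, 0)
        n_local = local_counts.get(cell, 0)
        diff = n_tourist - n_local
        if diff == 0:
            layer[cell] = ("tie", None)  # log10(0) is undefined, so ties get no value
        else:
            group = "tourist" if diff > 0 else "local"  # purple vs. orange on the map
            layer[cell] = (group, math.log10(abs(diff)))
    return layer

# Hypothetical counts for two cells.
layer = difference_layer({"A": 130}, {"A": 30, "B": 5})
print(layer["A"])  # ('tourist', 2.0)
```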

Nature-based tweets

The full text of each tweet was classified as nature-based or not. We developed a coarse dictionary of words that indicate a nature-based tweet. These include natural features like ocean, coast, and park, as well as words that indicate recreation (fishing, hiking, camping, etc.).

Note
I had a hard time finding an ontology or lexicon that fit this project. The terms below are skewed toward nature and recreation rather than words like “home” or “connection”.

The dictionary contains 67 terms. Terms prefixed with [^a-z] are regular expressions that only match when the word is preceded by a non-letter character, so that, for example, “tree” does not match inside “street”.

hike, trail, hiking, camping, tent, climb, summit, fishing, sail, sailing, boat, boating, ship, cruise, cruising, bike, biking, dive, diving, surf, surfing, paddle, swim, ocean, beach, [^a-z]sea, sand, coast, island, wave, fish, whale, dolphin, pacific, crab, lobster, water, shore, marine, seawater, lagoon, slough, saltwater, underwater, tide, aquatic, [^a-z]tree, [^a-z]earth, weather, sunset, sunrise, [^a-z]sun, climate, park, wildlife, [^a-z]view, habitat, [^a-z]rock, nature, mountains, [^a-z]peak, canyon, pier, wharf, environment, ecosystem, flower
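This sketch shows the dictionary match, using a subset of the terms above joined into a single regular expression. One assumption is that each tweet is padded with a leading space so that patterns like `[^a-z]sea` can also match at the very start of the text. Most terms match anywhere inside a word, so substring hits such as “pierce” (pier), “season” (sea), and “watercooler” (water) also count as nature-based. This may explain some of the examples below.

```python
import re

# A subset of the 67-term dictionary above.
NATURE_PATTERNS = ["hike", "beach", "[^a-z]sea", "[^a-z]tree", "water", "pier", "bike", "park", "[^a-z]view"]
NATURE_REGEX = re.compile("|".join(NATURE_PATTERNS))

def is_nature_based(text):
    """Return True if any dictionary pattern occurs in the lowercased tweet."""
    # Assumption: prepend a space so "[^a-z]" patterns can also match at the start of the text.
    return NATURE_REGEX.search(" " + text.lower()) is not None

print(is_nature_based("just posted a photo @ hendry’s beach"))  # True
print(is_nature_based("nantucket scallop season has opened!"))  # True ("season" matches [^a-z]sea)
print(is_nature_based("walking down the street"))               # False ("street" does not match [^a-z]tree)
```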

Let’s look at some examples of what tweets qualified as “nature-based”.

date | full_text | user_location | user_type | nature_word
2015-10-20 | just posted a photo @ hendry’s beach https://t.co/pmut0teuuy | Santa Barbara, CA | local | 1
2015-08-18 | beachy evenings in giant sweaters. loving all this beach time lately! pc @isarahstiteler @ santa… https://t.co/8z70haeao0 | Los Angeles, CA | tourist | 1
2015-06-04 | me the coolest pooch in santa barbara yesterday. his name is carl and he has his own custom made bike… https://t.co/ur7hflbhwm | NA | tourist | 1
2016-02-24 | nah nah nah nah nah nah nah nah…. batman! life at the #beach @ boathouse at hendry’s beach https://t.co/a6qabg4sna | Santa Ynez, CA | tourist | 1
2016-11-04 | nantucket scallop season has opened! we’re expecting our first of the little jewels at sly’s… https://t.co/1yxdztefyr | Carpinteria, California, USA | local | 1
2018-02-09 | benefit of #workingfromhome this is how i lunch :-) …at my desk, solo :-( a local #watercooler… https://t.co/qbsrecxgso | Santa Barbara, CA USA | local | 1
2015-08-14 | " i spoke to her of style, of an army of words..no iron spike can pierce a human heart as icily as a period in the right place" babel | NA | tourist | 1

Changes over time

All groups show an increase over time in the proportion of tweets that are nature-based.

Where are nature-based tweets?

After identifying nature-based tweets we can take a look at where these tweets are coming from and compare to the general pattern of tweets.

Who is tweeting nature-based tweets?

Not surprisingly, nature-based tweets are less common than other tweets: 24% of all geotagged tweets are nature-based.

Of tweets from locals, 21% are nature-based; of tweets from tourists, 30% are nature-based.
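These shares are counts of nature-based tweets divided by total tweets per group. The sketch below assumes records with the `user_type` and `nature_word` fields shown in the example table, plus a `date` field. Passing a different grouping key yields the breakdown over time. The sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date

def nature_share(tweets, key=lambda t: t["user_type"]):
    """Share of nature-based tweets per group; nature_word is 1 or 0."""
    totals = defaultdict(int)
    nature = defaultdict(int)
    for t in tweets:
        group = key(t)
        totals[group] += 1
        nature[group] += t["nature_word"]
    return {group: nature[group] / totals[group] for group in totals}

tweets = [
    {"user_type": "local", "nature_word": 1, "date": date(2016, 5, 1)},
    {"user_type": "local", "nature_word": 0, "date": date(2017, 5, 1)},
    {"user_type": "tourist", "nature_word": 1, "date": date(2017, 5, 1)},
]
print(nature_share(tweets))  # {'local': 0.5, 'tourist': 1.0}
# Breakdown by year and user type.
print(nature_share(tweets, key=lambda t: (t["date"].year, t["user_type"])))
```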

Are tweets in protected areas more often nature-based?

California Protected Areas Database

We can use the CPAD data to identify protected areas. [expand on CPAD here]
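Tagging each tweet with the CPAD area it falls in requires a point-in-polygon test. In practice, a GIS library would handle this, for example with a spatial join in geopandas. The pure-Python ray-casting sketch below shows the idea. It assumes each area is a single polygon of (lon, lat) vertices without holes. The area name and coordinates are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how many polygon edges a ray going east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed (so y1 != y2 here).
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def assign_area(lon, lat, areas):
    """Return the name of the first area containing the point, or None."""
    for name, polygon in areas.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None

# Hypothetical unit square standing in for a CPAD polygon.
areas = {"Example Park": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]}
print(assign_area(0.5, 0.5, areas))  # Example Park
print(assign_area(1.5, 0.5, areas))  # None
```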

Compare occurrence of nature vs non-nature based tweets

Lookout Park and Beach has the highest ratio of nature-based to non-nature-based tweets.

We can look at the 20 sites with the most tweets. The green highlighted portion represents nature-based tweets. The number gives the percentage of tweets at each site that are nature-based. Names in bold indicate sites where over 50% of tweets are nature-based.
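Here is a sketch of the summary behind this chart. It assumes each tweet record has a `site` field (None outside any site) and the `nature_word` flag. The site names and counts are hypothetical.

```python
from collections import Counter

def top_sites(tweets, n=20, bold_threshold=0.5):
    """Rank sites by tweet count; report % nature-based and whether it exceeds the bold threshold."""
    located = [t for t in tweets if t["site"] is not None]
    totals = Counter(t["site"] for t in located)
    nature = Counter(t["site"] for t in located if t["nature_word"])
    rows = []
    for site, total in totals.most_common(n):
        share = nature[site] / total
        rows.append((site, total, round(100 * share, 1), share > bold_threshold))
    return rows

tweets = [
    {"site": "Site A", "nature_word": 1},
    {"site": "Site A", "nature_word": 1},
    {"site": "Site A", "nature_word": 0},
    {"site": "Site B", "nature_word": 0},
    {"site": "Site B", "nature_word": 0},
    {"site": None, "nature_word": 1},
]
print(top_sites(tweets))  # [('Site A', 3, 66.7, True), ('Site B', 2, 0.0, False)]
```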

The Santa Barbara Harbor has the most tweets, followed by Manning Park. Interestingly, Manning Park appears to contain the default coordinate for the “Montecito, California” geotag.

How does this differ across tourists and locals?

The breakdown by tourists and locals includes only sites with at least 50 tweets over the study period.


Sentiment Analysis

We can apply sentiment analysis to the Twitter data to understand patterns and trends in the general sentiment of tweets.

The top graph shows the total number of geotagged tweets, which has gone down over time across tourists and locals.

The bottom graph shows average daily sentiment scores over time. Scores above 0 are positive and scores below 0 are negative. Most tweets are positive, and average sentiment increases over time.
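A daily average sentiment score can be sketched with a word lexicon. Each tweet scores its positive word count minus its negative word count, and the scores are averaged per day. The lexicon used in the analysis is not named here, so the tiny word lists below are hypothetical stand-ins, as are the sample tweets.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical mini-lexicon; the analysis used its own sentiment lexicon.
POSITIVE = {"love", "beautiful", "happy", "proud", "great"}
NEGATIVE = {"sad", "bad", "angry", "terrible"}
WORD = re.compile(r"[a-z']+")

def tweet_score(text):
    """Positive word count minus negative word count."""
    words = WORD.findall(text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def daily_mean_sentiment(tweets):
    """Average tweet score per day; above 0 is positive, below 0 is negative."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for t in tweets:
        totals[t["date"]] += tweet_score(t["full_text"])
        counts[t["date"]] += 1
    return {day: totals[day] / counts[day] for day in sorted(totals)}

tweets = [
    {"date": date(2019, 11, 15), "full_text": "We couldn’t be more happy and proud"},
    {"date": date(2019, 11, 15), "full_text": "Bad traffic today"},
]
print(daily_mean_sentiment(tweets))  # {datetime.date(2019, 11, 15): 0.5}
```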

Changes over time

[Figure panels: all tweets, tourists, locals]

Word clouds

Top 100 words across all Santa Barbara geotagged tweets
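The word counts behind a word cloud like this can be sketched as follows. Links are stripped, the text is tokenized, stop words are dropped, and the most frequent words are kept. The short stop-word list is a hypothetical placeholder for a fuller one.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "and", "to", "of", "in", "at", "my", "i", "is", "for", "on", "this", "with"}
TOKEN = re.compile(r"[a-z][a-z']*")

def top_words(texts, n=100):
    """Most frequent non-stop-words across tweets, ignoring links."""
    counts = Counter()
    for text in texts:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop t.co links before tokenizing
        counts.update(w for w in TOKEN.findall(text) if w not in STOP_WORDS)
    return counts.most_common(n)

print(top_words(["Sunset at the beach https://t.co/abc", "Beach day!"], n=1))  # [('beach', 2)]
```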

Ideas

We could also do this for nature-based tweets only, and for tweets within and outside CPAD areas.


Lessons learned

Social media data is hard to find and access for research

Future research

Assign each designated area to a location type (coastal, urban, foothill, or mountain) and look for interesting trends across types. We expect more tweets in coastal and urban areas, and possibly a higher share of nature-based tweets in coastal and mountain areas than in urban and foothill areas.

Looking at different scale areas

There might be an interesting comparison between rural, suburban, and urban areas. We hypothesize that tourist and local patterns would split in urban areas, might be aligned in suburban areas (like SB), and that the distinction might not exist in rural areas.

The proportion of nature-based words may indicate how people connect to a place. In Santa Barbara, we expect a strongly nature-based sense of place; in Manhattan, we would expect far fewer nature-based tweets.

In a blog post, we can pose questions we couldn’t answer, such as “Can the proportion of tourists and locals engaging with a place tell us anything?”

We could compare the percentage of nature-based tweets in SB with other areas. Across the whole state, what percentage of tweets are nature-based? Perhaps the average is only 5%.

Where and why do locals and tourists overlap in their use of an area? SB seems to have a high alignment between tourists and locals, which may be helpful for local policy. It may also be worth studying places with distinct differences in how tourists and locals use them.

Look at coastal places of different sizes (rural, small town, urban, megacity) to see how tourist and local patterns differ across scale.

Is there a threshold of tourists where locals don’t go anymore?

In areas where we see both tourists and locals engaging, what characteristics do we see?

Quantify transitions from rural areas to cities.

Discuss the broader literature on social media for conservation, how this project is similar or different, and how lessons from those papers guided this analysis.

  • Most of the literature focuses on tourist preferences

Ideas from hacky hour

“Do tourists and locals interact with nature differently?”

Hex assessment for % nature tweets

Could look at seasons. Do we see terms associated with fire/drought/rain?

Run the sentiment time series for positive tweets only, then for negative tweets only, to see whether the Thomas Fire or the debris flow shows up

Map of most popular words in a location?

Static dot map of tourists and locals, like Eric Fischer’s

Appendix

Additional figures to supplement the analysis.

Ranking sites by the proportion of nature-based tweets alone gives a different ordering. I removed places with only one tweet (four places in total), since a single nature-based tweet would give such a place a proportion of 100% and skew the results.
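The ranking with the minimum-tweet filter can be sketched as follows. The input format, a mapping from site to (nature tweets, total tweets), and the site labels are hypothetical.

```python
def rank_by_nature_share(site_counts, min_tweets=2):
    """Rank sites by share of nature-based tweets, dropping sites with fewer than min_tweets.

    site_counts: {site: (nature_tweets, total_tweets)}
    """
    shares = {
        site: nature / total
        for site, (nature, total) in site_counts.items()
        if total >= min_tweets
    }
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)

print(rank_by_nature_share({"A": (1, 1), "B": (3, 4), "C": (1, 4)}))  # [('B', 0.75), ('C', 0.25)]
```

Site "A" has only one tweet and would otherwise rank first with a share of 1.0.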

What sites have no nature-based tweets?

This chart shows all CPAD areas and the proportion of tweets that are nature-based. The total number of tweets is represented by the width of the line.

What areas of Santa Barbara have over 50% nature-based tweets but are not within a designated CPAD area?

The idea is to use the data to identify places where most tweet content is nature-based but which fall outside any designated area. These could be places that merit recognition or protection but currently lack it.
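This selection reduces to a filter once each place has three attributes: a nature-based tweet count, a total tweet count, and a flag for whether it lies within a CPAD area (for example, from a point-in-polygon test). The field names and sample places are hypothetical.

```python
def unprotected_nature_places(place_stats, threshold=0.5):
    """Places outside CPAD where more than `threshold` of tweets are nature-based.

    place_stats: {place: {"nature": int, "total": int, "in_cpad": bool}}
    """
    return sorted(
        place
        for place, s in place_stats.items()
        if not s["in_cpad"] and s["total"] > 0 and s["nature"] / s["total"] > threshold
    )

stats = {
    "hex 1": {"nature": 8, "total": 10, "in_cpad": False},
    "hex 2": {"nature": 9, "total": 10, "in_cpad": True},
    "hex 3": {"nature": 2, "total": 10, "in_cpad": False},
}
print(unprotected_nature_places(stats))  # ['hex 1']
```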

These are the top 10 positive and negative words across all tweets. Positive words occur far more often than negative ones.

Generally, “joy” and “positive” are the most common sentiment categories among tweets.